System and method for optimizing iterative circuit for cyclic redundancy check (CRC) calculation

ABSTRACT

A system for generating CRC code words associated with data ranging up to w bytes in width to be communicated over a communications channel includes a first plurality of serially coupled code-generation blocks, each for generating a CRC value based on data input to that block, respective blocks of the first plurality being configured to receive data inputs having respective byte widths ranging from 2^(N)+M to 2^(N−L)+M, where 2^(N)+M=w (so that N=log₂(w) when M=0), M is an offset value, and L is a whole number based on a maximum propagation delay criterion; a second plurality of parallel coupled code-generation blocks, each for generating a CRC value based on data inputs, respective blocks of the second plurality being configured to receive data having respective byte widths ranging from 2^(N−L)−1+M to 2⁰; and a device for selecting particular CRC code generation blocks in the first and second pluralities to be included in a CRC calculation based on the data input, whereby any number of data input bytes may be processed.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention relates generally to the implementation of packet-based cyclic redundancy checks in communications systems, and more particularly to an iterative circuit for performing and time-optimizing a cyclic redundancy check calculation in a communications system.

[0003] 2. Description of the Prior Art

[0004] Many packet-based communications protocols append code words to the packet transmission to check for the presence of errors introduced in the communications channel. One commonly used scheme for generating such code words is the Cyclic Redundancy Check (CRC). The transmitter appends a CRC code word to the end of the packet, while the receiver recalculates the CRC for the entire packet, including the code word. Several CRC schemes are in common use; the various schemes use different polynomials for the calculation and differ in the resulting code word length.

[0005] For a packet transmitted over a serial data stream, the logic circuitry required to calculate the CRC code word in the transmitter or the receiver is well known and very efficient. A Linear Feedback Shift Register, with exclusive-OR gates as needed to implement the target polynomial, is a sufficient implementation. Each state of the shift register is calculated from the current serial bit and the previous state of the shift register. So for a serial data stream, n latches (where n is the order of the polynomial) and a few exclusive-OR gates are all the circuitry required.
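As an illustration of the serial case, the following Python sketch models such a shift register updating its state once per input bit. It assumes, for concreteness only, the 32-bit generator polynomial 0x04C11DB7 processed most-significant bit first; the function name `lfsr_crc` is hypothetical. With an all-ones seed it reproduces the published CRC-32/MPEG-2 check value, whereas the circuits described below start from an all-zeros seed.

```python
POLY = 0x04C11DB7   # assumed generator polynomial (CRC-32), MSB-first form
WIDTH = 32          # n, the order of the polynomial = number of latches
MASK = (1 << WIDTH) - 1


def lfsr_crc(data: bytes, state: int = 0) -> int:
    """Bit-serial CRC: one shift-register update per serial input bit."""
    for byte in data:
        for i in range(7, -1, -1):                 # serial bits, MSB first
            bit = (byte >> i) & 1
            feedback = ((state >> (WIDTH - 1)) & 1) ^ bit
            state = (state << 1) & MASK            # shift the n latches
            if feedback:
                state ^= POLY                      # XOR gates at the polynomial taps
    return state


# With an all-ones seed this matches the CRC-32/MPEG-2 check value 0x0376E6E7.
print(hex(lfsr_crc(b"123456789", 0xFFFFFFFF)))
```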

[0006] However, high-speed serial data interfaces (e.g., 10 Gbps, 40 Gbps or faster interfaces) often require more expensive technologies (such as SiGe (silicon germanium)) to implement data signals at serial baud rates. Such interfaces use high-speed analog circuits and typically multiplex/demultiplex data to/from the serial interface into slower parallel data paths for processing within CMOS chips. Therefore, the CRC calculation circuit more typically operates on a parallel data bus. If the data bus is “w” bytes wide, then the CRC calculation must simultaneously process w bytes to determine the next state of the CRC calculation. Furthermore, since the next state of the CRC calculation depends on the previous state of the calculation, the calculation does not lend itself to pipelining.
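The loop-carried dependency is visible in a behavioral model of a w-byte datapath. In this sketch, which assumes the CRC-32 generator polynomial 0x04C11DB7 and uses the hypothetical names `crc_block` and `crc_parallel_bus`, `crc_block` stands in for one combinational CRC block whose next state is a function of the previous state and of all bytes presented in the cycle. In hardware that block would be a flattened XOR tree rather than a loop. Each cycle needs the previous cycle's result, which is why the calculation cannot simply be pipelined.

```python
POLY = 0x04C11DB7  # assumed CRC-32 generator polynomial, MSB-first form
MASK = 0xFFFFFFFF


def crc_block(state: int, chunk: bytes) -> int:
    """Next CRC state after len(chunk) bytes; models one combinational block."""
    for byte in chunk:
        state ^= byte << 24            # align the byte with the register MSBs
        for _ in range(8):
            if state & 0x80000000:
                state = ((state << 1) ^ POLY) & MASK
            else:
                state = (state << 1) & MASK
    return state


def crc_parallel_bus(packet: bytes, w: int, seed: int = 0) -> int:
    """Present the packet w bytes per clock; the result register closes the loop."""
    state = seed                                   # CRC result register after reset
    for offset in range(0, len(packet), w):
        beat = packet[offset:offset + w]           # one bus beat (the last may be short)
        state = crc_block(state, beat)             # depends on the previous cycle's state
    return state


packet = b"123456789"
# The final CRC depends only on the byte sequence, not on the bus width.
print(crc_parallel_bus(packet, 4, 0xFFFFFFFF) == crc_parallel_bus(packet, 32, 0xFFFFFFFF))
```

Note that with w=4 the last beat carries only one byte, which is the partial-width case discussed next.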

[0007] A further complexity is introduced when the packet data is not guaranteed to be an integral number of w bytes, and/or is not guaranteed to start and stop at aligned locations on the parallel data bus. For example, given a 32-byte wide data bus, a CRC calculation circuit must be capable of handling any of the possible resulting calculation widths: 1, 2, 3, 4, …, 31, 32 bytes. This makes the next-state decode for the CRC calculation significantly more complex, and the resulting logic circuit may require a significant amount of chip area. Furthermore, since this chip area is consumed primarily by combinatorial logic with large fanout connections, wirability and timing issues may result.
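To show why every width from 1 to w arises, the sketch below lists the number of bytes R presented to the CRC logic in each cycle. It assumes, hypothetically, that a packet starts at byte lane `start` of a w-byte bus and fills lanes contiguously after that. The function name and the lane convention are illustrative assumptions only.

```python
def bytes_per_cycle(start: int, length: int, w: int) -> list:
    """Bytes of the packet present on the bus in each clock cycle."""
    if not 0 <= start < w:
        raise ValueError("start lane must be in [0, w)")
    widths = []
    lane, remaining = start, length
    while remaining > 0:
        r = min(w - lane, remaining)   # bytes from this lane to the end of the bus
        widths.append(r)
        remaining -= r
        lane = 0                       # later beats start at lane 0
    return widths


print(bytes_per_cycle(0, 64, 32))   # aligned packet: [32, 32]
print(bytes_per_cycle(5, 40, 32))   # unaligned start, short final beat: [27, 13]
```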

[0008] In order to meet system requirements, the CRC calculation logic must typically consist of multiple CRC calculation blocks of various widths, with data steering to select the data fed into each block on any given cycle. One prior art implementation for a w-byte wide data bus uses “w” CRC calculation blocks of sizes 1 byte, 2 bytes, 3 bytes, etc., up to w bytes, to implement the function. In this configuration, data is fed into all of these blocks in parallel, and on any given clock cycle only one of the CRC calculation block outputs is used. Because one and only one CRC calculation block is selected during each cycle in this parallel approach, the combinatorial propagation delay is always equivalent to the delay of one CRC calculation block.

[0009] It would be highly desirable to provide a structured, iterative approach to the CRC calculation circuitry whereby the CRC calculation may be subdivided into blocks with selectable bus widths, which blocks can be cascaded to provide calculation for a bus width of any arbitrary number of bytes.

[0010] It would also be highly desirable to provide a structured, iterative approach to the CRC calculation circuitry that maximizes the circuit area reduction for a given target propagation delay.

SUMMARY OF THE INVENTION

[0011] The present invention is an approach for optimizing CRC calculations based on the realization that the size of a CRC calculation block is directly proportional to the width of the calculation, and that reducing the number of blocks for wide calculation widths provides greater savings than reducing the number of blocks for narrow calculation widths.

[0012] It is thus an object of the invention to provide a structured, iterative approach to the CRC calculation circuitry whereby the CRC calculation may be subdivided into blocks with selectable bus widths, which blocks can be cascaded to provide calculation for a parallel bus width of any arbitrary number of bytes.

[0013] It is a further object of the invention to provide a structured, logarithmically iterative approach to the CRC calculation circuitry whereby the CRC calculation may be subdivided into blocks with selectable bus widths that are powers of two (2) bytes, i.e., 2^(N) for N=0, 1, …, X, which blocks can be cascaded to provide calculation for a bus width of any arbitrary number of bytes.

[0014] It is a further object of the invention to provide a structured, logarithmically iterative approach to the CRC calculation circuitry whereby the CRC calculation may be subdivided into blocks that allow for selectable bus widths whose values are not powers of 2.

[0015] The structured approach to the CRC calculation is carried out by iterative circuitry whereby, according to a preferred embodiment, there is provided a system for generating CRC code words associated with data ranging up to w bytes in width to be communicated over a communications channel, including:

[0016] a first plurality of serially coupled code-generation blocks, each for generating a CRC value based on data input to that block, respective blocks of the first plurality being configured to receive data inputs having respective byte widths ranging from 2^(N)+M to 2^(N−L)+M, where 2^(N)+M=w, M is an offset value, and L is a whole number based on a maximum propagation delay criterion;

[0017] a second plurality of parallel coupled code-generation blocks, each for generating a CRC value based on data inputs, respective blocks of the second plurality being configured to receive data having respective byte widths ranging from 2^(N−L)−1+M to 2⁰; and

[0018] a means for selecting particular CRC code generation blocks in the first and second pluralities to be included in a CRC calculation based on the data input, so that data inputs of any number of bytes may be processed.

[0019] According to the principles of the invention, the CRC calculation process time in the structured, iterative approach is optimized based on the realization that the size of a CRC calculation block is directly proportional to the width of the calculation, and that reducing the number of blocks for wide calculation widths provides greater savings than reducing the number of blocks for narrow calculation widths.

[0020] Advantageously, for wide data bus widths, a structured, logarithmically iterative approach significantly reduces the amount of logic required to perform the calculation.

BRIEF DESCRIPTION OF THE DRAWINGS

[0021] The objects, features and advantages of the present invention will become apparent to one skilled in the art in view of the following detailed description taken in combination with the attached drawings, in which: FIG. 1 is a block diagram of an overall system architecture in which the present invention can operate, formed in accordance with one embodiment of the present invention.

[0022] FIG. 2 is a block diagram of an overall system architecture in which the present invention can operate, formed in accordance with a second embodiment of the present invention.

[0023] FIG. 3 is a block diagram of an overall system architecture in which the present invention can operate, formed in accordance with a third embodiment of the present invention.

[0024] FIG. 4 is a block diagram of an overall system architecture in which the present invention can operate, formed in accordance with a fourth embodiment of the present invention.

[0025] FIG. 5 is a block diagram of an overall system architecture in which the present invention can operate, formed in accordance with a generic embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0026] The present invention may be more fully understood with reference to FIG. 1, which shows an overall system architecture according to a first embodiment of the invention.

[0027] The first embodiment is directed to a structured, logarithmically iterative approach to the CRC calculation circuitry that provides for the cascading of CRC calculation blocks, with the number of blocks being cascaded dependent on the desired width of the calculation.

[0028] FIG. 1 is a block diagram of the CRC calculation circuitry 100 according to a first variant of the present invention. According to this first variant, the CRC calculation circuitry 100 is subdivided into blocks that have logarithmically selected bus widths of 1 byte, 2 bytes, 4 bytes, 8 bytes, etc. (i.e., powers of two bytes). These blocks can be cascaded to provide calculation for a bus width of any arbitrary number of bytes.

[0029] As shown in FIG. 1, the logic circuit 100 includes several CRC calculation blocks (220, 320, 420), each of which calculates the CRC value based on a seed input from the seed multiplexors (130, 230, 330, 430) and on the data input (120). It is understood that the incoming data input path width of w bytes should equal 2^(N). That is, for a w-byte data bus, N=log₂(w), and the circuit includes N+1 CRC calculation blocks for byte widths corresponding to 2^(N), 2^(N−1), …, 2¹, 2⁰. When performing a calculation, the CRC_seed_select signals control the seed multiplexors (230, 330, 430) to select whether a respective CRC calculation block is included in the calculation or is bypassed. By selectively including or bypassing these blocks, an arbitrary number of bytes may be processed. For instance, to process w bytes, the 2^(N) block is selected and all other blocks are bypassed; to process w−1 bytes, the 2^(N) block is bypassed and blocks 2^(N−1), …, 2⁰ are selected; to process w−2 bytes, the 2^(N) and 2⁰ blocks are bypassed and blocks 2^(N−1), …, 2¹ are selected; and so forth. It is understood that each CRC calculation block is a combinatorial XOR tree, the exact design of which depends on the CRC polynomial being implemented.
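A behavioral sketch of the FIG. 1 datapath follows. It includes a bytewise model of a CRC block under an assumed CRC-32 polynomial (0x04C11DB7); `crc_block` and `fig1_datapath` are hypothetical names. Blocks are visited from 2^N down to 2^0. A bypassed block passes its seed through unchanged, as the seed multiplexors do, and each included block receives the next slice of the input, as the data steering multiplexors do. Because a CRC is a sequential state update, cascading blocks whose widths sum to R gives the same result as a single R-byte block.

```python
POLY = 0x04C11DB7  # assumed CRC-32 generator polynomial, MSB-first form
MASK = 0xFFFFFFFF


def crc_block(state: int, chunk: bytes) -> int:
    """Behavioral model of one CRC calculation block (an XOR tree in hardware)."""
    for byte in chunk:
        state ^= byte << 24
        for _ in range(8):
            if state & 0x80000000:
                state = ((state << 1) ^ POLY) & MASK
            else:
                state = (state << 1) & MASK
    return state


def fig1_datapath(seed: int, data: bytes, n: int) -> int:
    """Cascade of blocks 2^n, 2^(n-1), ..., 2^0 processing R = len(data) bytes."""
    w = 1 << n
    r = len(data)
    if not 1 <= r <= w:
        raise ValueError("R must be in 1..w")
    state, offset = seed, 0
    for k in range(n, -1, -1):
        width = 1 << k
        if k == n:
            included = r == w                 # top block is used only for R == w
        else:
            included = r != w and bool(r & width)   # binary digits of R pick blocks
        if included:
            state = crc_block(state, data[offset:offset + width])  # data steering
            offset += width
        # A bypassed block leaves `state` unchanged (seed multiplexor bypass path).
    return state


data = bytes(range(31))
# Cascading 16 + 8 + 4 + 2 + 1 byte blocks equals one 31-byte calculation.
print(fig1_datapath(0, data, 5) == crc_block(0, data))
```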

[0030] The circuit initially starts with the CRC_reset signal asserted such that the initial seed value of all 0's is selected by multiplexor 130. The selected CRC calculation is then performed by CRC calculation blocks (220, 320, 420) as selected or bypassed by seed multiplexors (230, 330, 430). The output of the CRC calculation at multiplexor 430 is stored in the CRC result register 110. The CRC register value is updated in each cycle based on the Data Input (120). Data steering multiplexors (210, 310, 410) select the data to be used at each CRC calculation block based on which blocks are selected. On the last cycle of the packet, the CRC output 510 provides the calculated CRC value for the packet to downstream logic.

[0031] In an example embodiment of the invention, there is provided a 32-byte wide data input. This N=5 system therefore has 6 CRC calculation blocks of widths 32 bytes, 16 bytes, 8 bytes, 4 bytes, 2 bytes, and 1 byte. Nominally, packet data consumes the entire bus width, and the 32-byte wide CRC calculation block is selected. However, the packet may consume only a portion of the bus width at the beginning and the end of the packet transmission. In these cases, CRC_select and Data_select control signals are generated based on the expected data alignment on the bus.

[0032] For wide data bus widths, the approach according to the first variant significantly reduces the amount of logic required to perform the calculation (for w=32 bytes, an 88% logic size reduction can be realized). For a w-byte wide data bus, with the number of blocks being cascaded dependent on the desired width of the calculation, the worst case propagation delay occurs for a calculation width of w−1, during which log₂(w) CRC calculation blocks are cascaded.
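The 88% figure and the log₂(w) worst case can be checked with a simple cost model. The sketch assumes, as stated in the summary, that block size is directly proportional to calculation width, so area is counted in hypothetical "byte units". It also assumes that delay equals the number of cascaded blocks on the path. All function names are illustrative.

```python
def prior_art_area(w: int) -> int:
    """Blocks of 1, 2, 3, ..., w bytes, one per possible width."""
    return w * (w + 1) // 2


def log_iterative_area(w: int) -> int:
    """Blocks of 1, 2, 4, ..., w bytes (w a power of two): 2w - 1 byte units."""
    return 2 * w - 1


def blocks_cascaded(r: int, w: int) -> int:
    """Blocks on the path for R bytes in the FIG. 1 scheme."""
    return 1 if r == w else bin(r).count("1")


w = 32
reduction = 1 - log_iterative_area(w) / prior_art_area(w)
worst = max(range(1, w + 1), key=lambda r: blocks_cascaded(r, w))
# area reduction 88.1%, worst case R=31 with 5 blocks
print(f"area reduction {reduction:.1%}, worst case R={worst} "
      f"with {blocks_cascaded(worst, w)} blocks")
```

Under this model the prior art uses 528 byte units against 63 for the logarithmic cascade.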

[0033] According to this embodiment, the CRC for an incoming data packet that is longer than w bytes is calculated over several clock cycles. Let R represent the number of bytes that must be processed in a given clock cycle of the calculation. Although R may take arbitrary values (R≤w) on any clock cycle, the CRC calculation requires some number of cycles during which R=w, plus one cycle in which any remaining bytes are processed.

[0034] Thus, consider a control stage for the first embodiment that can process R bytes of data (R≤w), where the calculation circuitry comprises CRC calculation blocks of size 2^(N), 2^(N−1), 2^(N−2), …, 4 (=2²), 2 (=2¹) and 1 (=2⁰) bytes. If w=32, then there are 32_byte, 16_byte, 8_byte, 4_byte, 2_byte and 1_byte blocks. The control logic asserts control signals A_(N), A_(N−1), …, A₀ such that R≤32 and R=(A_(N))*2^(N)+(A_(N−1))*2^(N−1)+…+(A₀)*1. Each of the control signals A_(N), A_(N−1), …, A₀ is 1 or 0, selecting blocks as specified below.

[0035] When R=32 then 32_byte or 16_byte+16_byte CRC are used

[0036] When R=31 then 16_byte+8_byte+4_byte+2_byte+1_byte CRC are used

[0037] When R=30 then 16_byte+8_byte+4_byte+2_byte CRC are used

[0038] When R=29 then 16_byte+8_byte+4_byte+1_byte CRC are used

[0039] When R=28 then 16_byte+8_byte+4_byte CRC are used

[0040] When R=27 then 16_byte+8_byte+2_byte+1_byte CRC are used

[0041] When R=26 then 16_byte+8_byte+2_byte CRC are used

[0042] When R=25 then 16_byte+8_byte+1_byte CRC are used

[0043] When R=24 then 16_byte+8_byte CRC are used

[0044] When R=23 then 16_byte+4_byte+2_byte+1_byte CRC are used

[0045] When R=22 then 16_byte+4_byte+2_byte CRC are used

[0046] When R=21 then 16_byte+4_byte+1_byte CRC are used

[0047] When R=20 then 16_byte+4_byte CRC are used

[0048] When R=19 then 16_byte+2_byte+1_byte CRC are used

[0049] When R=18 then 16_byte+2_byte CRC are used

[0050] When R=17 then 16_byte+1_byte CRC are used

[0051] When R=16 then 16_byte CRC is used

[0052] When R=15 then 8_byte+4_byte+2_byte+1_byte CRC are used

[0053] When R=14 then 8_byte+4_byte+2_byte CRC are used

[0054] When R=13 then 8_byte+4_byte+1_byte CRC are used

[0055] When R=12 then 8_byte+4_byte CRC are used

[0056] When R=11 then 8_byte+2_byte+1_byte CRC are used

[0057] When R=10 then 8_byte+2_byte CRC are used

[0058] When R=9 then 8_byte+1_byte CRC are used

[0059] When R=8 then 8_byte CRC is used

[0060] When R=7 then 4_byte+2_byte+1_byte CRC are used

[0061] When R=6 then 4_byte+2_byte CRC are used

[0062] When R=5 then 4_byte+1_byte CRC are used

[0063] When R=4 then 4_byte CRC is used

[0064] When R=3 then 2_byte+1_byte CRC are used

[0065] When R=2 then 2_byte CRC is used

[0066] When R=1 then 1_byte CRC is used
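Apart from R=w, which uses the single 2^N block, the table above is the binary representation of R. The following is a minimal sketch of control signal generation for the first embodiment. The function name is hypothetical, the signals are returned most-significant first as [A_N, …, A_0], and for R=32 it picks the 32_byte block rather than the 16_byte+16_byte alternative.

```python
def variant1_controls(r: int, n: int):
    """Return ([A_n, ..., A_0], selected block widths) for R bytes, w = 2^n."""
    w = 1 << n
    if not 1 <= r <= w:
        raise ValueError("R must be in 1..w")
    if r == w:
        signals = [1] + [0] * n                     # only the 2^n block
    else:
        signals = [0] + [(r >> k) & 1 for k in range(n - 1, -1, -1)]
    widths = [1 << k for k, a in zip(range(n, -1, -1), signals) if a]
    return signals, widths


for r in (32, 31, 10, 1):
    print(r, variant1_controls(r, 5)[1])   # [32], [16, 8, 4, 2, 1], [8, 2], [1]
```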

[0067] Each CRC module (32_byte, …, 1_byte) is sent data of the corresponding byte length. For example, when R=10, the 8_byte and 2_byte modules are used, and the control signal ‘Data_select’ (shown in FIG. 1) is asserted to route the first 8 bytes of data to the 8_byte CRC calculation block and the last 2 bytes of data to the 2_byte CRC calculation block.

[0068] The second variant of the invention provides an optimization between the prior art approach and the first variant, i.e., the logarithmically iterative approach. According to this embodiment, area reduction is maximized for a given target propagation delay. This is accomplished by noting that the size of a CRC calculation block is directly proportional to the width of the calculation. Therefore, reducing the number of CRC calculation blocks for wide calculation widths provides greater savings than reducing the number of blocks for narrow calculation widths. At the same time, propagation delays in the cascaded blocks of the logarithmically iterative approach arise primarily from CRC calculation blocks of narrow width. Thus, by using the logarithmically iterative approach for wider calculation widths and the parallel approach for smaller calculation widths, an optimization of timing versus area for the circuit is provided.

[0069] Assume a system for which w is the byte width of the data bus. Given N=log₂(w) and d_(max), the maximum delay (in units of CRC calculation block delays) that is to be permitted, then L=d_(max)−1.

[0070] A CRC calculation system according to the second variant is then constructed using a logarithmically iterative approach for CRC calculation block widths of 2^(N−L) and greater, and a parallel approach for CRC calculation block widths of less than 2^(N−L). The resulting system contains CRC calculation blocks for byte widths of 2^(N), 2^(N−1), …, 2^(N−L+1), 2^(N−L), 2^(N−L)−1, 2^(N−L)−2, …, 2¹, 2⁰. It contains L+1 CRC calculation blocks in the logarithmically iterative portion of the system and 2^(N−L)−1 CRC calculation blocks in the parallel portion of the circuit.

[0071] The worst case delay through such a system occurs for calculation byte widths in the range of 2^(N)−1 down to 2^(N)−2^(N−L)+1 inclusive. In this range, propagation must occur through L cascaded iterative CRC calculation blocks plus one parallel CRC calculation block, i.e., through L+1=d_(max) blocks.
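The second variant's block inventory and worst-case delay can be derived mechanically. This sketch uses hypothetical names and counts delay in cascaded blocks. It builds the iterative and parallel widths from w and d_max, then lists the values of R that need the most blocks.

```python
import math


def variant2_blocks(w: int, d_max: int):
    """Iterative widths 2^N..2^(N-L) and parallel widths 2^(N-L)-1..1."""
    n = int(math.log2(w))
    if 1 << n != w:
        raise ValueError("the second variant assumes w is a power of two")
    l = d_max - 1
    unit = 1 << (n - l)
    iterative = [1 << k for k in range(n, n - l - 1, -1)]
    parallel = list(range(unit - 1, 0, -1))
    return iterative, parallel


def variant2_delay(r: int, w: int, d_max: int) -> int:
    """Iterative blocks for R's multiple of 2^(N-L), plus one parallel block if needed."""
    iterative, _ = variant2_blocks(w, d_max)
    unit = iterative[-1]
    if r == w:
        return 1
    return bin(r // unit).count("1") + (1 if r % unit else 0)


w, d_max = 32, 3
iterative, parallel = variant2_blocks(w, d_max)
delays = {r: variant2_delay(r, w, d_max) for r in range(1, w + 1)}
worst = max(delays.values())
print(iterative, parallel)                               # [32, 16, 8] [7, 6, 5, 4, 3, 2, 1]
print(worst, [r for r, d in delays.items() if d == worst])   # 3 [25, ..., 31]
```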

[0072] The second variant of the disclosed invention assumes the targetbyte width (w) of the CRC calculation is a power of 2.

[0073] FIG. 2 is a block diagram of the CRC calculation circuitry 200 according to the second variant of the disclosed invention. As shown in FIG. 2, the logic circuit 200 includes several CRC calculation blocks (220, 320, 420, 520, 521), each of which calculates the CRC value based on the seed input from the seed multiplexors (130, 230, 330, 430) and on the data input (120). For a system with a w-byte data bus and a maximum delay d_(max), N=log₂(w) and L=d_(max)−1. The logarithmically iterative portion of the system includes L+1 CRC calculation blocks for byte widths corresponding to 2^(N), 2^(N−1), …, 2^(N−L) (220, 320, 420). The parallel portion of the system includes 2^(N−L)−1 CRC calculation blocks for byte widths corresponding to 2^(N−L)−1, …, 2⁰ (520, 521). When performing a calculation, the CRC_seed_select signals control the seed multiplexors (230, 330, 430) so as to select whether each iterative CRC calculation block is included in the calculation or is bypassed. By selectively including or bypassing these blocks, any number of bytes divisible by 2^(N−L) may be processed. In addition, multiplexor 530 selects which of the parallel CRC calculation blocks (520, 521), if any, provides the output. This extends the processing capability to any arbitrary number of bytes.

[0074] For instance, to process w bytes, the 2^(N) block is selected, all other iterative blocks are bypassed, and multiplexor 530 selects the bypass input; to process w−1 bytes, the 2^(N) block is bypassed, all other iterative blocks (i.e., blocks 2^(N−1), …, 2^(N−L+1), 2^(N−L)) are selected, and multiplexor 530 selects the input from the 2^(N−L)−1 block; and so forth. Each CRC calculation block is a combinatorial XOR tree, the exact design of which depends on the CRC polynomial being implemented.

[0075] The circuit initially starts with CRC_reset asserted such that the initial seed value of all 0's is selected by multiplexor 130. The selected CRC calculation is then performed by the iterative CRC calculation blocks (220, 320, 420) as selected or bypassed by seed multiplexors (230, 330, 430), and by the parallel CRC calculation blocks (520, 521) as selected or bypassed by output multiplexor 530. The output of the CRC calculation at multiplexor 530 is stored in the CRC result register 110. The CRC register value is updated in each cycle based on the Data Input (120). Data steering multiplexors (210, 310, 410) select the data to be used at each iterative CRC calculation block based on which blocks are selected. Data steering multiplexor 510 selects the data to be used by the selected parallel CRC calculation block. On the last cycle of the packet, the CRC output 610 provides the calculated CRC value for the packet to downstream logic.

[0076] In an example embodiment implementing the second variant of the invention, for a 32-byte wide data bus input, d_(max)=3. This results in an N=5, L=2 system having three (3) iterative CRC calculation blocks of widths 32 bytes, 16 bytes, and 8 bytes, and seven (7) parallel CRC calculation blocks of widths 7 bytes down to 1 byte. Nominally, packet data consumes the entire bus width, and the 32-byte wide CRC calculation block is selected. However, the packet may consume only a portion of the bus width at the beginning and the end of the packet transmission. In these cases, CRC_select and Data_select control signals are generated based on the expected data alignment on the bus.

[0077] In the timing-optimized version according to the second variant of FIG. 2, there are fewer stages in the critical path than in FIG. 1, which achieves better timing. As with the previous embodiment, a CRC calculation for a data packet longer than w bytes is performed over multiple clock cycles. Let R represent the number of bytes that must be processed in a given clock cycle of the calculation.

[0078] Thus, a control stage for the second embodiment may process R bytes of data (R≤w), where the calculation circuitry comprises CRC calculation blocks of size 2^(N), 2^(N−1), 2^(N−2), …, 2^(N−L), 2^(N−L)−1, 2^(N−L)−2, … and 1 bytes. If w=32, N=5, and L=2, then there are 32_byte, 16_byte, 8_byte, 7_byte, 6_byte, 5_byte, 4_byte, 3_byte, 2_byte and 1_byte CRC stages. The control logic asserts control signals A_(N), A_(N−1), …, A_(N−L), B_(K−1), …, B₁ to select processing for R bytes such that R≤32 and R=(A_(N))*2^(N)+(A_(N−1))*2^(N−1)+…+(A_(N−L))*2^(N−L)+(B_(K−1))*(2^(N−L)−1)+…+(B₁)*1, where K=2^(N−L) and at most one of the B signals is asserted. Each of the control signals A_(N), …, A_(N−L), B_(K−1), …, B₁ is 1 or 0, selecting blocks as specified below.

[0079] When R=32 then 32_byte or 16_byte+16_byte CRC are used

[0080] When R=31 then 16_byte+8_byte+7_byte CRC are used

[0081] When R=30 then 16_byte+8_byte+6_byte CRC are used

[0082] When R=29 then 16_byte+8_byte+5_byte CRC are used

[0083] When R=28 then 16_byte+8_byte+4_byte CRC are used

[0084] When R=27 then 16_byte+8_byte+3_byte CRC are used

[0085] When R=26 then 16_byte+8_byte+2_byte CRC are used

[0086] When R=25 then 16_byte+8_byte+1_byte CRC are used

[0087] When R=24 then 16_byte+8_byte CRC are used

[0088] When R=23 then 16_byte+7_byte CRC are used

[0089] When R=22 then 16_byte+6_byte CRC are used

[0090] When R=21 then 16_byte+5_byte CRC are used

[0091] When R=20 then 16_byte+4_byte CRC are used

[0092] When R=19 then 16_byte+3_byte CRC are used

[0093] When R=18 then 16_byte+2_byte CRC are used

[0094] When R=17 then 16_byte+1_byte CRC are used

[0095] When R=16 then 16_byte CRC is used

[0096] When R=15 then 8_byte+7_byte CRC are used

[0097] When R=14 then 8_byte+6_byte CRC are used

[0098] When R=13 then 8_byte+5_byte CRC are used

[0099] When R=12 then 8_byte+4_byte CRC are used

[0100] When R=11 then 8_byte+3_byte CRC are used

[0101] When R=10 then 8_byte+2_byte CRC are used

[0102] When R=9 then 8_byte+1_byte CRC are used

[0103] When R=8 then 8_byte CRC is used

[0104] When R=7 then 7_byte CRC is used

[0105] When R=6 then 6_byte CRC is used

[0106] When R=5 then 5_byte CRC is used

[0107] When R=4 then 4_byte CRC is used

[0108] When R=3 then 3_byte CRC is used

[0109] When R=2 then 2_byte CRC is used

[0110] When R=1 then 1_byte CRC is used
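A sketch of the corresponding control logic for w=32 and d_max=3 (N=5, L=2) follows. The function name is hypothetical. It returns the selected iterative widths and at most one parallel width, and for R=32 it chooses the single 32-byte block.

```python
def variant2_select(r: int, n: int = 5, l: int = 2):
    """Return (iterative widths, parallel width or None) for R bytes."""
    w, unit = 1 << n, 1 << (n - l)
    if not 1 <= r <= w:
        raise ValueError("R must be in 1..w")
    if r == w:
        return [w], None                      # A_N = 1, all other signals 0
    multiple, remainder = r - r % unit, r % unit
    # A_(N-1)..A_(N-L) follow the binary digits of the multiple of 2^(N-L).
    iterative = [1 << k for k in range(n - 1, n - l - 1, -1) if multiple & (1 << k)]
    return iterative, (remainder or None)     # one-hot B picks the remainder block


for r in (32, 31, 23, 7):
    print(r, variant2_select(r))
```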

[0111] The third embodiment of the circuit for performing and time-optimizing a cyclic redundancy check calculation is directed to a structured, logarithmically iterative approach that is more generic, allowing for values of “w” that are not powers of 2.

[0112] FIG. 3 is a block diagram of the CRC calculation circuitry 300 according to a third variant of the present invention. According to this third variant, the logarithmically iterative portion of the system includes L+1 CRC calculation blocks whose byte widths are assigned more generically as 2^(N)+M, 2^(N−1)+M, …, 2^(N−L)+M (220, 320, 420), where M is a non-negative offset value (i.e., greater than or equal to zero) comprising an arbitrary constant. This more generic representation permits target byte widths (w) for the system other than powers of 2. Correspondingly, the parallel portion of the system includes 2^(N−L)+M−1 CRC calculation blocks for byte widths corresponding to 2^(N−L)−1+M, …, 2⁰ (520, 521). As with the previously described variants of this invention, when performing a calculation, the CRC_seed_select signals control the seed multiplexors (230, 330, 430) so as to select whether each iterative CRC calculation block is included in the calculation or is bypassed. By selectively including or bypassing these blocks, any combination of the iterative block widths may be processed. In addition, multiplexor 530 selects which of the parallel CRC calculation blocks (520, 521), if any, provides the output.
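For a concrete third-variant configuration, take w=33 (N=5, M=1) with L=2, the case discussed for the fourth embodiment. The sketch below uses hypothetical names. A small exhaustive search over the L lower iterative blocks stands in for whatever control logic an implementation would use. It lists the block widths, checks that every R from 1 to w is reachable, and reports how many cascaded blocks the worst case needs.

```python
from itertools import combinations


def variant3_blocks(n: int, l: int, m: int):
    """Iterative widths 2^k + M (k = N..N-L) and parallel widths 2^(N-L)+M-1..1."""
    iterative = [(1 << k) + m for k in range(n, n - l - 1, -1)]
    parallel = list(range((1 << (n - l)) + m - 1, 0, -1))
    return iterative, parallel


def variant3_select(r: int, n: int, l: int, m: int):
    """Fewest blocks for R: the top block alone, or lower blocks plus <= 1 parallel."""
    iterative, parallel = variant3_blocks(n, l, m)
    if r == iterative[0]:
        return [r]
    best = None
    for count in range(len(iterative)):              # 0..L lower iterative blocks
        for combo in combinations(iterative[1:], count):
            rest = r - sum(combo)
            if rest == 0 or rest in parallel:
                choice = list(combo) + ([rest] if rest else [])
                if best is None or len(choice) < len(best):
                    best = choice
    if best is None:
        raise ValueError(f"R={r} cannot be formed")
    return best


n, l, m = 5, 2, 1
print(variant3_blocks(n, l, m))      # ([33, 17, 9], [8, 7, 6, 5, 4, 3, 2, 1])
print(max(len(variant3_select(r, n, l, m)) for r in range(1, 34)))   # 3
```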

[0113] FIG. 4 is a block diagram of the CRC calculation circuitry 400 according to a fourth embodiment of the invention, wherein each CRC calculation block (220, 320, 420, etc.) is able to process S bytes, where S is an arbitrary positive integer. The iterative portion of the circuit includes “k” blocks, each capable of processing S bytes, such that (k+1)*S>w and k*S≤w. The parallel portion of the system includes S−1 calculation blocks for byte widths corresponding to S−1, S−2, …, 1. When performing a calculation, the CRC_seed_select signals control the seed multiplexors (130, 230, 330, 430) to select a multiple of S bytes to be processed. In addition, multiplexor 530 selects which of the parallel CRC calculation blocks (520, 521), if any, provides the output. This extends the processing capability to any arbitrary number of bytes.

[0114] The fourth embodiment reduces logic relative to implementations using the second or third embodiments, while still improving timing relative to implementations using the first embodiment. Let d represent the delay (in units of CRC calculation blocks) for the circuit. The worst case value of d through this circuit occurs for calculations of x bytes in the range k*S+1 to w bytes, when k blocks are selected to perform the calculation on the first k*S bytes and one parallel block is selected to perform the calculation on the remaining bytes, such that d=k+1. For the case of w=33 bytes, L=2, M=1 and k=4, an implementation using the third embodiment would have a worst case delay of d=L+1=3 blocks, while this embodiment has a worst case delay of d=5 blocks. However, this embodiment reduces logic, since the number of larger-width CRC calculation blocks is reduced relative to the second embodiment. In accordance with this embodiment, the number of parallel configured CRC calculation blocks is typically equal to S−1; however, for a more aggressive timing scheme, the number of parallel configured CRC calculation blocks may exceed S.

[0115] Thus, a control stage for the fourth embodiment can process R bytes of data (R≤w), where the calculation circuitry comprises k blocks of S bytes and S−1 parallel blocks of S−1, S−2, …, 1 bytes. For example, in accordance with the fourth variant of the invention depicted in FIG. 4, where w=34 and S=5 (so that k=6), the control logic asserts control signals A_(k−1), A_(k−2), …, A₀, B_(S−1), …, B₁ to select processing for R bytes such that R≤34 and R=(A_(k−1))*S+(A_(k−2))*S+…+(A₀)*S+(B_(S−1))*(S−1)+…+(B₁)*1. Each of the control signals A_(k−1), A_(k−2), …, A₀, B_(S−1), …, B₁ is 1 or 0 as specified below, where X represents the number of blocks of S bytes selected during a given clock cycle (which particular blocks are selected is otherwise arbitrary):

[0116] 34 Bytes CRC=S_byte*6+4_byte where X=6 (whole path)

[0117] 33 Bytes CRC=S_byte*6+3_byte where X=6

[0118] 32 Bytes CRC=S_byte*6+2_byte where X=6

[0119] 31 Bytes CRC=S_byte*6+1_byte where X=6

[0120] 30 Bytes CRC=S_byte*6 where X=6

[0121] 29 Bytes CRC=S_byte*5+4_byte where X=5

[0122] 25 Bytes CRC=S_byte*5 where X=5

[0123] 24 Bytes CRC=S_byte*4+4_byte where X=4

[0124] 23 Bytes CRC=S_byte*4+3_byte where X=4

[0125] 22 Bytes CRC=S_byte*4+2_byte where X=4

[0126] 21 Bytes CRC=S_byte*4+1_byte where X=4

[0127] 20 Bytes CRC=S_byte*4 where X=4

[0128] 19 Bytes CRC=S_byte*3+4_byte where X=3

[0129] 18 Bytes CRC=S_byte*3+3_byte where X=3

[0130] 17 Bytes CRC=S_byte*3+2_byte where X=3

[0131] 16 Bytes CRC=S_byte*3+1_byte where X=3

[0132] 15 Bytes CRC=S_byte*3 where X=3

[0133] 14 Bytes CRC=S_byte*2+4_byte where X=2

[0134] 13 Bytes CRC=S_byte*2+3_byte where X=2

[0135] 12 Bytes CRC=S_byte*2+2_byte where X=2

[0136] 11 Bytes CRC=S_byte*2+1_byte where X=2

[0137] 10 Bytes CRC=S_byte*2 where X=2

[0138] 9 Bytes CRC=S_byte+4_byte where X=1

[0139] 8 Bytes CRC=S_byte+3_byte where X=1

[0140] 7 Bytes CRC=S_byte+2_byte where X=1

[0141] 6 Bytes CRC=S_byte+1_byte where X=1

[0142] 5 Bytes CRC=S_byte where X=1

[0143] 4 Bytes CRC=4_byte where X=0

[0144] 3 Bytes CRC=3_byte where X=0

[0145] 2 Bytes CRC=2_byte where X=0

[0146] 1 Byte CRC=1_byte where X=0
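The fourth-embodiment list follows from integer division by S. The minimal sketch below uses hypothetical function names. It assumes k is the largest count of S-byte blocks with k*S≤w and that the remainder R mod S goes to a single parallel block. It returns (X, parallel width) for any R from 1 to w, including values not spelled out in the list above.

```python
def fourth_k(w: int, s: int) -> int:
    """Number of S-byte iterative blocks: k*S <= w < (k+1)*S."""
    return w // s


def variant4_select(r: int, w: int, s: int):
    """Return (X, parallel width) for R bytes: X blocks of S bytes plus R mod S."""
    if not 1 <= r <= w:
        raise ValueError("R must be in 1..w")
    x, rest = divmod(r, s)
    if x > fourth_k(w, s):
        raise ValueError("not enough S-byte blocks")
    return x, rest                       # rest == 0 means no parallel block


w, s = 34, 5
print(fourth_k(w, s))                    # k = 6 blocks of 5 bytes
for r in (34, 28, 25, 4):
    print(r, variant4_select(r, w, s))   # (6, 4), (5, 3), (5, 0), (0, 4)
```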

[0147] FIG. 5 provides a generic form of the CRC calculation circuitry 450 encompassing all four variants of the invention described with respect to FIGS. 1-4. The iterative portion of serially cascaded blocks (220, 320, 420, etc.) has CRC calculation block byte widths of sizes F_(x), F_(x−1), …, F₁, where F_(x)≥F_(x−1)≥…≥F₁≥0 bytes. The parallel blocks (520, 521, etc.) have CRC calculation block byte widths of sizes G_(y−i) bytes, where G_(y−i)=G_(y−i−1)+1 for i=0 to y−2, with G₁=1, and where y≥F₁−1. According to the generic form, the number of bytes included in the CRC calculation can be expressed by the relation R=A_(x)*F_(x)+A_(x−1)*F_(x−1)+…+A₁*F₁+B_(y)*G_(y)+…+B₁*G₁, where x is the number of F blocks, y is the number of G blocks, and A_(x), A_(x−1), …, A₁, B_(y), …, B₁=0 or 1 as described in accordance with one of the four embodiments.

[0148] FIG. 5 depicts the generic form of all variants of the invention. For the first embodiment: F_(x)=2^(N), F_(x−1)=2^(N−1), …, F₁=2⁰; and y=F₁−1=0, such that there are no CRC calculation blocks in the parallel portion of the circuit. For the second embodiment: F_(x)=2^(N), F_(x−1)=2^(N−1), …, F₁=2^(N−L); G_(y)=2^(N−L)−1, …, G₁=1. For the third embodiment: F_(x)=2^(N)+M, F_(x−1)=2^(N−1)+M, …, F₁=2^(N−L)+M; G_(y)=2^(N−L)+M−1, …, G₁=1. For the fourth embodiment: F_(x)=F_(x−1)=…=F₁=S; and G_(y)=y, …, G₁=1, where y≥S−1.
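The generic relation can be exercised with a single routine that accepts any iterative widths F and parallel widths G. As an assumption for illustration, the sketch below considers every 0/1 setting of the A signals, allows at most one B signal, and keeps the setting that needs the fewest cascaded blocks. The patent leaves the particular selection rule to each embodiment, and all names here are hypothetical.

```python
from itertools import product


def generic_select(r: int, f: list, g: list):
    """Return (A bits, parallel width or None, delay) minimizing cascaded blocks."""
    best = None
    for a in product((0, 1), repeat=len(f)):
        rest = r - sum(bit * width for bit, width in zip(a, f))
        if rest < 0 or (rest and rest not in g):
            continue                               # remainder has no matching G block
        delay = sum(a) + (1 if rest else 0)
        if best is None or delay < best[2]:
            best = (a, rest or None, delay)
    if best is None:
        raise ValueError(f"R={r} is not representable")
    return best


def worst_delay(w: int, f: list, g: list) -> int:
    """Largest block count over all R in 1..w (also checks every R is reachable)."""
    return max(generic_select(r, f, g)[2] for r in range(1, w + 1))


embodiments = {
    "first (w=32)": (32, [32, 16, 8, 4, 2, 1], []),
    "second (w=32, d_max=3)": (32, [32, 16, 8], list(range(7, 0, -1))),
    "third (w=33, L=2, M=1)": (33, [33, 17, 9], list(range(8, 0, -1))),
    "fourth (w=34, S=5)": (34, [5] * 6, list(range(4, 0, -1))),
}
for name, (w, f, g) in embodiments.items():
    print(name, worst_delay(w, f, g))              # 5, 3, 3, 7
```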

[0149] While the invention has been particularly shown and described with respect to illustrative and preferred embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in form and details may be made therein without departing from the spirit and scope of the invention, which should be limited only by the scope of the appended claims.

What is claimed is:
 1. An architecture for generating an error control code appended to a data packet having a given byte width that is to be transmitted over a communications link having a parallel data width of 2^(n) bytes, comprising a plurality of code-generation blocks, a first one of said code-generation blocks generating said code for 2^(n) bytes, and a second one of said code-generation blocks generating said code for 2^(n−1) bytes, wherein said code from said first code-generating block is selectively coupled to said second code generating block, and said first and second code-generation blocks are selectively enabled, for a given data packet depending on said given byte width of said given data packet.
 2. An architecture for generating an error control code appended to a data packet having a given byte width that is to be transmitted over a communications link having a parallel data width of 2^(n) bytes, comprising: a first plurality of serially coupled code-generation blocks, a first one of said first plurality of code-generation blocks generating said code for 2^(n) bytes, and a second one of said first plurality of code-generation blocks generating said code for 2^(n−1) bytes, wherein said code from said first code-generating block is selectively coupled to said second code generating block; and a second plurality of parallel coupled code-generation blocks, a first one of said second plurality of code-generation blocks generating said code for 2^(x) bytes, and a second one of said code-generation blocks generating said code for 2^(y) bytes, x and y being different from n and n−1; wherein one or more of said first plurality of serially coupled code-generation blocks, and none or one of said parallel coupled code generation blocks, are selectively enabled for a given data packet, depending on said given byte width of said given data packet.
 3. A circuit for generating CRC code words associated with bytes of data to be transmitted over a communications channel, said communications channel capable of transmitting data up to w-bytes in width, said circuit comprising: a first plurality of serially coupled code-generation blocks each for generating a CRC value based on data input to said block, respective blocks of said first plurality of code-generation blocks configured for receiving data inputs having respective byte widths ranging from 2^(N)+M, 2^(N−1)+M, . . . , 2^(N−L)+M where w=2^(N)+M, M is an offset value, and L is a whole number based on a maximum propagation delay criteria for processing CRC values in said first plurality; a second plurality of parallel coupled code-generation blocks each for generating a CRC value based on data input to said block, respective blocks of said second plurality of code-generation blocks configured for receiving data inputs having respective byte widths ranging from 2^(N−L)−1+M, 2^(N−L)−2+M, . . . , 2⁰; a multiplexor means controllable for selecting particular CRC code generation blocks in said first and second pluralities to be included in a CRC calculation based on said data input; wherein by selectively including or bypassing CRC code generation blocks any number of data input bytes divisible by 2^(N−L) may be processed for corresponding CRC code generation.
 4. The CRC code generating circuit as claimed in claim 3, wherein M≥0.
 5. The CRC code generating circuit as claimed in claim 3, wherein N is equal to log₂(w).
 6. A circuit for generating CRC code words associated with bytes of data to be transmitted over a communications channel, said communications channel capable of transmitting data up to w-bytes in width, said circuit comprising: a first plurality of serially coupled code-generation blocks each for generating a CRC value based on data input to said block, respective blocks of said first plurality of code-generation blocks configured for receiving data inputs having respective byte widths of two's power ranging from 2^(N) to 2^(N−L), where N is equal to log₂(w) and L is a whole number based on a maximum propagation delay criteria for processing CRC values in said first plurality; a second plurality of parallel coupled code-generation blocks each for generating a CRC value based on data input to said block, respective blocks of said second plurality of code-generation blocks configured for receiving data inputs having respective byte widths ranging from 2^(N−L)−1 to 2⁰; a multiplexor means controllable for selecting particular CRC code generation blocks in said first and second pluralities to be included in a CRC calculation based on said data input; wherein by selectively including or bypassing CRC code generation blocks any number of data input bytes divisible by 2^(N−L) may be processed for corresponding CRC code generation.
 7. The CRC code generating circuit as claimed in claim 6, wherein said first plurality of serially coupled code-generation blocks comprises a quantity L+1, and said second plurality of parallel coupled code-generation blocks comprises a quantity 2^(N−L)−1.
 8. The CRC code generating circuit as claimed in claim 6, wherein said first maximum propagation delay criteria for processing CRC values in said first plurality of serially coupled code-generation blocks comprises a maximum delay, d_(max), of clock cycles, wherein said whole number L=d_(max)−1.
 9. A circuit for generating CRC code words associated with bytes of data to be transmitted over a communications channel, said communications channel capable of transmitting data up to w-bytes in width, said circuit comprising: a first plurality of serially coupled code-generation blocks each for generating a CRC value based on data input to said block, each block of said first plurality of code-generation blocks configured for receiving data inputs having a byte width of S bytes; a second plurality of parallel coupled code-generation blocks each for generating a CRC value based on data input to said block, respective blocks of said second plurality of code-generation blocks configured for receiving data inputs having respective byte widths ranging from y bytes to 1 byte; a multiplexor means controllable for selecting particular CRC code generation blocks in said first and second pluralities to be included in a CRC calculation based on said data input; wherein a time for calculating a particular CRC code value in said first plurality of serially coupled code-generation blocks summed with time for processing a particular CRC code value in said second plurality of parallel coupled code-generation blocks is less than or equal to a single clock period.
 10. The CRC code generating circuit as claimed in claim 9, wherein y=S−1, said byte widths of said second plurality ranging consecutively from S−1, S−2, . . . , 1 bytes.
 11. The CRC code generating circuit as claimed in claim 9, wherein y>S.
 12. The CRC code generating circuit as claimed in claim 9, wherein said first plurality of serially coupled code-generation blocks includes k blocks, each block capable of processing S bytes, such that (k+1)*S>w and k*S≤w.
 13. A circuit for generating CRC code words associated with bytes of data to be transmitted over a communications channel, said communications channel capable of transmitting data up to w-bytes in width, said circuit comprising: a first plurality of serially coupled code-generation blocks each for generating a CRC value based on data input to said block, respective blocks of said first plurality of code-generation blocks configured for receiving data inputs ranging from byte widths respectively of F_(x), F_(x−1), …, F₁, where F_(x)≥F_(x−1)≥…≥F₁≥0 bytes; a second plurality of parallel coupled code-generation blocks each for generating a CRC value based on data input to said block, respective blocks of said second plurality of code-generation blocks configured for receiving data inputs having respective byte widths G_(y−i) where G_(y−i)=G_(y−i−1)+1 for i=0 to y−1, where y≥F₁−1; a multiplexor means controllable for selecting particular CRC code generation blocks in said first and second pluralities to be included in a CRC calculation based on said data input; wherein by selectively including or bypassing CRC code generation blocks any number of data input bytes may be processed for corresponding CRC code generation according to a relation: A_(x)*F_(x)+A_(x−1)*F_(x−1)+…+A₁*F₁+B_(y)*G_(y)+…+B₁*G₁, where A_(x), A_(x−1), …, A₁, B_(y), …, B₁=0 or 1.
 14. The CRC code generating circuit as claimed in claim 13, wherein said F_(x)=2^(N), F_(x−1)=2^(N−1), …, F₁=2⁰; and y=F₁−1=0.
 15. The CRC code generating circuit as claimed in claim 13, wherein said F_(x)=2^(N), F_(x−1)=2^(N−1), …, F₁=2^(N−L), G_(y)≥2^(N−L)−1, …, G₁=1; and L is a whole number based on a maximum propagation delay criteria for processing CRC values in said first plurality.
 16. The CRC code generating circuit as claimed in claim 13, wherein said F_(x)=2^(N)+M, F_(x−1)=2^(N−1)+M, …, F₁=2^(N−L)+M, G_(y)≥2^(N−L)+M−1, …, G₁=1.
 17. The CRC code generating circuit as claimed in claim 13, wherein said F_(x)=F_(x−1)=…=F₁=S; and G_(y)=y, G_(y−1)=y−1, …, G₁=1 where y≥S−1.